Text Reduction-Enrichment at WebCLEF

Authors

  • Franco Rojas López
  • Héctor Jiménez-Salazar
  • David Pinto
Abstract

In this paper we report the results obtained after submitting one run to the Mixed Monolingual task of WebCLEF 2006. We used a text reduction process based on the selection of mid-frequency terms. Although our approach enhances precision, its recall must be improved by an enrichment process based on the addition of high co-occurrence terms. On the corpus used last year in the BiEnEs task, we observed an improvement of 40%. However, we also obtained low Mean Reciprocal Rank (MRR) values compared with those of the Mixed Monolingual task of WebCLEF 2005. We believe that our low MRR derives from a poor preprocessing phase, but this issue must be investigated in detail.
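For readers unfamiliar with the measure, the following is a minimal sketch of how Mean Reciprocal Rank is usually computed: for each query, take the reciprocal of the rank of the first relevant document (0 if none was retrieved), then average over all queries. The function name and the input representation (a list of 1-based ranks, with `None` for a miss) are assumptions made for illustration; this is not the WebCLEF evaluation script.

```python
from typing import Optional, Sequence


def mean_reciprocal_rank(first_relevant_ranks: Sequence[Optional[int]]) -> float:
    """Average of 1/rank over all queries.

    Each entry is the 1-based rank of the first relevant document for one
    query, or None when no relevant document was retrieved (contributes 0).
    """
    if not first_relevant_ranks:
        return 0.0
    total = sum(1.0 / rank for rank in first_relevant_ranks if rank is not None)
    return total / len(first_relevant_ranks)


if __name__ == "__main__":
    # Hypothetical ranks for four queries; the third query found nothing relevant.
    ranks = [1, 2, None, 4]
    print(mean_reciprocal_rank(ranks))  # (1 + 0.5 + 0 + 0.25) / 4 = 0.4375
```

Because a miss counts as 0, a preprocessing step that removes the terms needed to match the target page lowers MRR directly, even if precision on the remaining queries is good.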

Similar resources

Vocabulary Reduction and Text Enrichment at WebCLEF

Nowadays, cross-lingual Information Retrieval (IR) is one of the greatest challenges to deal with. Besides, one of the most important issues in IR consists in the corpus vocabulary reduction in order to make possible to use in real situations some methods of IR such as the well-known vector space model. In this work, we have considered a vocabulary reduction process based on the selection of mi...

BUAP-UPV TPIRS: A System for Document Indexing Reduction at WebCLEF

In this paper we present the results of BUAP/UPV universities in WebCLEF, a particular task of CLEF 2005. Particularly, we evaluate our information retrieval system at the bilingual “English to Spanish” task. Our system uses a term reduction process based on the Transition Point technique. Our results show that it is possible to reduce the number of terms to index, thereby improving the perform...

TPIRS: A System for Document Indexing Reduction on WebCLEF

In this paper we present the results of BUAP/UPV universities in WebCLEF, a particular task of CLEF 2005. Particularly, we evaluate our information retrieval system in the bilingual English to Spanish track. Our system uses a term reduction process based on the Transition Point technique. Our results show that it is possible to reduce the number of terms to index, thereby improving the performa...

Multilingual Web Retrieval Experiments with Field Specific Indexing Strategies for CLEF 2006 at the University of Hildesheim

For WebCLEF 2006 we experimented with the analysis and extraction of the HTML structure of the web documents. In addition, blind relevance feedback was applied in the search process. As in 2005, the experiments were carried out with a language independent indexing strategy. We experimented with HTML title, H1 element and other elements emphasizing text. Our index contained title and H1, emphasi...

The Impact of Input Enrichment in Long Text vs. Short Texts on Grammatical Accuracy in Writing Among Elementary Language Learners

This study was conducted to investigate the influence of teaching accurate grammar in writing via enriched long text and short text for the elementary students at Shokouhe_Farhang institute. The homogenized subjects were divided into two groups of 18 and 17 participants. Using a writing exam as a pretest in order to check the students’ knowledge in English past tense. The control group received the...

Publication date: 2006